Mining Discourse Markers For Chinese Textual Summarization
Authors
Abstract
Discourse markers foreshadow the message thrust of a text and saliently guide its rhetorical structure, both of which are important for content filtering and text abstraction. This paper reports on efforts to automatically identify and classify discourse markers in Chinese texts using heuristic-based and corpus-based data-mining methods, as an integral part of automatic text summarization via rhetorical structure and discourse markers. Encouraging results are reported.
Similar Resources
Enhancement Of A Chinese Discourse Marker Tagger With C4.5
Discourse markers are complex discontinuous linguistic expressions which are used to explicitly signal the discourse structure of a text. This paper describes efforts to improve an automatic tagging system which identifies and classifies discourse markers in Chinese texts by applying machine learning (ML) to the disambiguation of discourse markers, as an integral part of automatic text summariz...
Recognizing the Contrast Relation in Persian Discourse Using Supervised Learning Methods
Discourse is the part of language that is used to communicate. A discourse relation recognition system can identify one or more relations between the textual units in a discourse. As in other languages, the contrast relation is one of the relations available in Persian discourse. Contrast relation recognition in discourse is useful for generation and perception of discourse, paraphrasing and ...
Discourse Automatic Annotation of Texts: An Application to Summarization
Exploiting the discourse structure of a text and identifying its discourse categories are essential for automatic summarization, as well as for textual information retrieval. In this paper we describe an automatic summarization strategy that uses these elements as the basis for the extraction of the most relevant textual segments that will constitute the...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. Sifting through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
A corpus-driven approach to discourse organisation: from cues to complex markers
This paper reports on an experiment implementing a data-intensive approach to discourse organisation. Its focus is on enumerative structures envisaged as a type of textual pattern in a sequentiality-oriented approach to discourse. On the basis of a large-scale annotation exercise calling upon automatic feature mark-up alongside manual annotation, we explore a method to identify complex discours...